INTERSPEECH 2022
Non-Intrusive Intelligibility Predictor for Hearing-Impaired Individuals using Self-Supervised Speech Representations
Close, George, Hain, Thomas, Goetze, Stefan
Self-supervised speech representations (SSSRs) have been successfully applied to a number of speech-processing tasks, e.g. as feature extractors for speech quality (SQ) prediction, which is, in turn, relevant for assessing and training speech-enhancement systems for users with normal or impaired hearing. However, why and how quality-related information is encoded well in such representations remains poorly understood. In this work, techniques for non-intrusive prediction of SQ ratings are extended to the prediction of intelligibility for hearing-impaired users. It is found that self-supervised representations are useful as input features to non-intrusive prediction models, achieving performance competitive with more complex systems. A detailed analysis of performance across Clarity Prediction Challenge 1 listeners and enhancement systems indicates that more data might be needed to allow generalisation to unknown systems and (hearing-impaired) individuals.
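As a rough illustration of the non-intrusive setup described in this abstract, the sketch below mean-pools frame-level representations into an utterance embedding and applies a linear prediction head. The feature dimensions, weights, and data are placeholders, not the authors' model; real systems would use embeddings from a pretrained self-supervised network.

```python
import numpy as np

def predict_intelligibility(frame_feats, w, b):
    """Non-intrusive prediction: mean-pool frame-level representations
    (shape T x D) into one utterance embedding, then apply a linear
    head. No clean reference signal is needed -- hence 'non-intrusive'."""
    utt_emb = frame_feats.mean(axis=0)   # (D,) utterance embedding
    return float(utt_emb @ w + b)        # scalar predicted rating

# Toy demo with random stand-in "SSSR" features (50 frames, 8 dims).
rng = np.random.default_rng(0)
feats = rng.standard_normal((50, 8))
w = rng.standard_normal(8)
score = predict_intelligibility(feats, w, b=0.5)
```

In practice the linear head would be trained on listener ratings; the pooling step is what lets a fixed-size predictor handle utterances of any length.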
Can ChatGPT Detect Intent? Evaluating Large Language Models for Spoken Language Understanding
Recently, large pretrained language models have demonstrated strong language understanding capabilities. This is particularly reflected in their zero-shot and in-context learning abilities on downstream tasks through prompting. To assess their impact on spoken language understanding (SLU), we evaluate several such models, like ChatGPT and OPT, of different sizes on multiple benchmarks. We verify the emergent ability unique to the largest models, as they can reach intent-classification accuracy close to that of supervised models with zero or few shots in various languages, given oracle transcripts. By contrast, the results for smaller models that fit on a single GPU fall far behind. We note that the error cases often arise from the annotation scheme of the dataset; responses from ChatGPT are still reasonable. We show, however, that the model is worse at slot filling, and its performance is sensitive to ASR errors, suggesting serious challenges for the application of such textual models to SLU.
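The zero-shot prompting setup this abstract evaluates can be sketched as below. The prompt template, intent labels, and label-matching logic are illustrative assumptions, not the paper's exact protocol; any chat-completion model could be plugged in where the response string comes from.

```python
# Hypothetical intent inventory; real benchmarks define their own labels.
INTENTS = ["play_music", "set_alarm", "get_weather"]

def build_prompt(utterance, intents=INTENTS):
    """Format a zero-shot prompt: list the allowed intents and ask the
    model to answer with exactly one label."""
    labels = ", ".join(intents)
    return (
        f"Classify the intent of the utterance below.\n"
        f"Answer with exactly one of: {labels}.\n\n"
        f"Utterance: {utterance}\nIntent:"
    )

def parse_intent(response, intents=INTENTS):
    """Map a free-form model response back to a known label;
    returns None when no label matches (e.g. annotation mismatches)."""
    text = response.strip().lower()
    for intent in intents:
        if intent in text:
            return intent
    return None

prompt = build_prompt("wake me up at 7 tomorrow")
```

A few-shot variant would simply prepend labelled example pairs to the same template; the parsing step matters because free-form responses do not always match the dataset's annotation scheme exactly.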
Defense Against Adversarial Attacks on Audio DeepFake Detection
Kawa, Piotr, Plata, Marcin, Syga, Piotr
Audio DeepFakes (DF) are artificially generated utterances created using deep learning, with the primary aim of fooling listeners in a highly convincing manner. Their quality is sufficient to pose a severe threat to security and privacy, including the reliability of news and exposure to defamation. Multiple neural-network-based methods for detecting generated speech have been proposed to counter these threats. In this work, we cover the topic of adversarial attacks, which decrease the performance of detectors by adding superficial (difficult for a human to spot) changes to input data. Our contribution consists of evaluating the robustness of three detection architectures against adversarial attacks in two scenarios (white-box and via transferability), and then enhancing it using adversarial training performed with our novel adaptive training method. Moreover, one of the investigated architectures is RawNet3, which, to the best of our knowledge, we are the first to adapt to DeepFake detection.
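A minimal sketch of the white-box attack idea, assuming the classic Fast Gradient Sign Method (FGSM) against a toy linear "detector" (the paper's detectors are deep networks; the logistic model and its analytic gradient here are stand-ins chosen so the example is self-contained):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_attack(x, y, w, eps):
    """FGSM on a toy logistic detector: perturb the input x by eps in
    the direction that increases the binary cross-entropy loss for the
    true label y (0 = bona fide, 1 = fake), degrading the detector."""
    p = sigmoid(w @ x)        # detector's predicted fake-probability
    grad_x = (p - y) * w      # analytic BCE gradient w.r.t. the input
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(1)
w = rng.standard_normal(16)
x = rng.standard_normal(16)   # stand-in for a frame of audio samples
x_adv = fgsm_attack(x, y=1.0, w=w, eps=0.01)
```

Adversarial training, as evaluated in the paper, amounts to generating such perturbed inputs during training and including them (with correct labels) in the detector's training set.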
A Study on the Reliability of Automatic Dysarthric Speech Assessments
Cadet, Xavier F., Aloufi, Ranya, Ahmadi-Abhari, Sara, Haddadi, Hamed
Automating dysarthria assessments offers the opportunity to develop effective, low-cost tools that address the current limitations of manual and subjective assessments. Nonetheless, it is unclear whether current approaches rely on dysarthria-related speech patterns or on external factors. We aim toward obtaining a clearer understanding of dysarthria patterns. To this end, we study the effects of noise in recordings, both through addition and reduction. We design and implement a new method for visualizing and comparing feature extractors and models, at a patient level, in a more interpretable way. We use the UA-Speech dataset with a speaker-based split. Results reported in the literature appear to have been obtained irrespective of such a split, leading to models that may be overconfident due to data leakage. We hope that these results raise awareness in the research community regarding the requirements for establishing reliable automatic dysarthria assessment systems.
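The noise-addition experiments mentioned above presuppose mixing noise into recordings at a controlled level. A common way to do this, sketched below under the assumption of a target signal-to-noise ratio in dB (the paper's exact mixing procedure is not specified here), is to rescale the noise before adding it:

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db):
    """Scale `noise` so the mixture speech + scaled-noise has the
    requested SNR (in dB) relative to `speech`, then add it."""
    p_speech = np.mean(speech ** 2)          # average speech power
    p_noise = np.mean(noise ** 2)            # average noise power
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise

rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)          # 1 s stand-in signal
noise = rng.standard_normal(16000)
mix = add_noise_at_snr(speech, noise, snr_db=10.0)
```

Sweeping `snr_db` then lets one check whether an assessment model's predictions track the dysarthric speech itself or merely the recording conditions.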
INTERSPEECH 2022 -- My First Conference Experience
The conference INTERSPEECH, one of the biggest conferences on the science and technology of spoken language processing, was held at Songdo ConvensiA in Incheon, South Korea, from Sep. 18 to 22, 2022. Integrating two previous conference series (EUROSPEECH and ICSLP), the first INTERSPEECH was held in 2000 in Beijing. Since then, INTERSPEECH has grown in popularity and held its 23rd event this year, 2022. Our company mainly focuses on building services that facilitate a better understanding of what events are taking place in an environmental sound scene. Despite a small discrepancy between this focus and the conference theme, since our company primarily concentrates on nonverbal audio signals rather than speech itself, many papers from INTERSPEECH have aided our research so far.